chore: fix flaky langchain tests#1584

Merged
hassiebp merged 2 commits into main from fix-ci
Mar 27, 2026

Conversation

@hassiebp
Contributor

@hassiebp hassiebp commented Mar 27, 2026

Disclaimer: Experimental PR review

Greptile Summary

This PR updates assertion counts in tests/test_langchain.py to reflect an increase in the number of observations/generations produced per LangChain run. The increase is likely the result of a dependency version bump (LangChain, LangGraph, or OpenAI SDK) that causes the CallbackHandler to capture additional span or generation events.

Key changes:
- Six assert len(...) calls are raised across test_callback_generated_from_trace_chat, test_openai_instruct_usage, test_link_langfuse_prompts_invoke, test_link_langfuse_prompts_stream, test_link_langfuse_prompts_batch, and test_multimodal.
- The new counts are: observations 2→3 (simple chat/multimodal), 3→4 (instruct batch), generations 2→4 (single-run prompt tests), and 6→10 (batch-of-3 prompt test).
- Three imports (StrOutputParser, Runnable, OpenAI) remain inside the test_openai_instruct_usage function body instead of being moved to the top of the module.
- The newly counted observations/generations in most tests are not verified for content, type, or prompt linkage, leaving it unclear whether they represent expected new spans or unintended artefacts.
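The gap noted in the last point, counts that are raised without checking what is being counted, can be illustrated with a minimal sketch. The `Observation` dataclass and the `count_by_type` helper below are hypothetical stand-ins written for this example; they are not the Langfuse API objects or the code in tests/test_langchain.py, and the sample data is illustrative rather than taken from a real trace.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical stand-in for an observation returned for a trace."""

    type: str  # e.g. "GENERATION" or "SPAN"
    name: str


def count_by_type(observations):
    """Return how many observations of each type a trace contains."""
    return dict(Counter(obs.type for obs in observations))


# Illustrative data only; not taken from a real trace.
observations = [
    Observation(type="SPAN", name="RunnableSequence"),
    Observation(type="GENERATION", name="ChatOpenAI"),
    Observation(type="SPAN", name="StrOutputParser"),
]

# A bare length check passes for any three observations.
assert len(observations) == 3

# A per-type check also pins down what the observations are,
# so an unexpected extra span or generation fails the test.
assert count_by_type(observations) == {"SPAN": 2, "GENERATION": 1}
```

Asserting on the type breakdown (and, where relevant, on names or prompt linkage) would make a future change in the number of captured events easier to diagnose than a changed total alone.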

Confidence Score: 5/5

Test-only change; safe to merge — all remaining findings are non-blocking style/documentation suggestions.

All findings are P2: one style violation (inline imports), one test-completeness gap (unverified extra observations), and one documentation gap (unexplained batch count). None block correctness or production behaviour.

tests/test_langchain.py — inline imports at lines 254-256 and unverified extra observations across several test functions.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/test_langchain.py | Updates six observation/generation count assertions to reflect new behavior producing more spans per LangChain run; contains inline imports inside a test function and leaves newly-added observations unverified. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LangChain chain.invoke / stream / batch] --> B[CallbackHandler captures events]
    B --> C{Observation Type?}
    C -->|GENERATION| D[LLM call observation e.g. ChatOpenAI, OpenAI]
    C -->|SPAN| E[Wrapping / chain span]
    D --> F[Langfuse trace]
    E --> F
    F --> G[API: trace.observations]
    G --> H{Count assertions}
    H -->|invoke / stream| I[expect 4 generations, was 2]
    H -->|batch x3| J[expect 10 generations, was 6]
    H -->|simple chat| K[expect 3 observations, was 2]
    H -->|instruct batch x2| L[expect 4 observations, was 3]

Reviews (1): Last reviewed commit: "push"

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - Move imports to the top of the module instead of p... (source)

Learnt From
langfuse/langfuse-python#1387

@github-actions

@claude review

@hassiebp hassiebp changed the title from "push" to "chore: fix flaky langchain tests" Mar 27, 2026
@hassiebp hassiebp enabled auto-merge (squash) March 27, 2026 14:28
@hassiebp hassiebp merged commit 840cf2a into main Mar 27, 2026
13 checks passed
@hassiebp hassiebp deleted the fix-ci branch March 27, 2026 15:15